{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Training an agent to Walk"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<font size=3.0>\n",
    "Now let us learn how to train a robot to walk using Gym along with some fundamentals.\n",
    "The strategy is that reward X points will be given when the robot moves forward and if the\n",
    "robot fails to move then Y points will be reduced. So the robot will learn to walk in the\n",
    "event of maximizing the reward.\n",
    "\n",
    "First, we will import the library, then we will create a simulation instance by make\n",
    "function. \n",
    "<br>\n",
    "<br>\n",
    "Open AI Gym provides an environment called BipedalWalker-v2 for training\n",
    "robotic agents in simple terrain. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import gym\n",
    "env = gym.make('BipedalWalker-v2')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "<font size=3.0> Then for each episode (Agent-Environment interaction between initial and final state), we\n",
    "will initialize the environment using reset method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "for episode in range(100):\n",
    "    observation = env.reset()\n",
    "    \n",
    "    # Render the environment on each step \n",
    "    for i in range(10000):\n",
    "         env.render()\n",
    "            \n",
    "        # we choose action by sampling random action from environment's action space. Every environment has\n",
    "        # some action space which contains the all possible valid actions and observations,\n",
    "        \n",
    "        action = env.action_space.sample()\n",
    "    \n",
    "        # Then for each step, we will record the observation, reward, done, info\n",
    "        observation, reward, done, info = env.step(action)\n",
    "    \n",
    "   # When done is true, we print the time steps taken for the episode and break the current episode.\n",
    "    if done:\n",
    "        print(\"{} timesteps taken for the Episode\".format(i+1))\n",
    "         break"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<font size=3.5> \n",
    "\n",
    "The agent will learn by trail and error and over a period of time it starts selecting actions which gives the\n",
    "maximum rewards."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:universe]",
   "language": "python",
   "name": "conda-env-universe-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
